phylopathThis vignette gives a quick overview of the package by recreating the phylogenetic path analysis described in:
Gonzalez-Voyer A & von Hardenberg A. 2014. An Introduction to Phylogenetic Path Analysis. Chapter 8. In: Garamszegi LZ (ed.), Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. pp. 201-229. Springer-Verlag Berlin Heidelberg.
You can find this book chapter online, for free. For an introduction to the methodology, as well as the data, see the wonderful book chapter.
Specifically, we recreate the Rhinogrades example here. The data used has been included in this package.
Following figure 8.7, we first create all 9 causal models using the DAG function. This function uses regression equation (or formulas) to express the hypothesized relationships in the models. The easiest way to create this is by taking each node (a variable), putting it on the left-hand side of a tilde (~), and putting its causal parents on the right-hand side.
Note: any unconnected nodes can be added using var ~ var, but we won’t need this for this example.
library(phylopath)
models <- list(
one = DAG(LS ~ BM, NL ~ BM, DD ~ NL, RS ~ DD),
two = DAG(LS ~ BM, NL ~ BM, DD ~ NL, RS ~ LS + DD),
three = DAG(LS ~ BM, NL ~ BM, DD ~ NL, RS ~ NL),
four = DAG(LS ~ BM, NL ~ BM, DD ~ NL, RS ~ BM + NL),
five = DAG(LS ~ BM, NL ~ BM, DD ~ NL, RS ~ BM + NL + DD),
six = DAG(LS ~ BM, NL ~ BM + RS, DD ~ NL, RS ~ BM),
seven = DAG(LS ~ BM, NL ~ BM + RS, DD ~ NL, RS ~ LS + BM),
eight = DAG(LS ~ BM, NL ~ BM + RS, DD ~ NL),
nine = DAG(LS ~ BM, NL ~ BM + RS, DD ~ NL, RS ~ LS)
)The DAG function simply produces a matrix that summarizes the connections between the variables.
models$one## BM NL DD RS LS
## BM 0 1 0 0 1
## NL 0 0 1 0 0
## DD 0 0 0 1 0
## RS 0 0 0 0 0
## LS 0 0 0 0 0
## attr(,"class")
## [1] "matrix" "DAG"
Note that the models are of class matrix as well as of class DAG. This means we can have special DAG methods.
For example, it is good to check if the DAG looks like you were expecting. Simply plot one of the models to inspect it visually.
plot(models$one)Now that we have the models, we can perform the path analysis using the phylo_path function. For this we will need a data set, included in this package as rhino, as well as a phylogenetic tree, rhino_tree.
By default, phylo_path uses Pagel’s “lambda” correlation structure (ape::corPagel), but if you want, for example, to use a simple Brownian motion model, you can supply ape::corBrownian instead.
result <- phylo_path(models, data = rhino, tree = rhino_tree)The result we end up with is a phylo_path object. Simply printing it gives us a table with statistics for all the models, and a plot will appear with the average model.
result## model k q C p CICc delta_CICc l w
## 1 eight 6 9 7.111 0.850 27.111 0.000 1.000 0.375
## 2 six 5 10 5.563 0.851 28.035 0.924 0.630 0.236
## 3 four 5 10 5.735 0.837 28.207 1.096 0.578 0.217
## 4 five 4 11 4.421 0.817 29.421 2.310 0.315 0.118
## 5 seven 4 11 7.139 0.522 32.139 5.028 0.081 0.030
## 6 nine 5 10 10.238 0.420 32.710 5.599 0.061 0.023
## 7 three 6 9 31.466 0.002 51.466 24.356 0.000 0.000
## 8 one 6 9 64.388 0.000 84.388 57.277 0.000 0.000
## 9 two 5 10 64.344 0.000 86.816 59.705 0.000 0.000
The numbers and width of the arrow represent path coefficients. In this case, all paths are green since all relationships are positive.
You can access the DAG for the average model with result$average_model. You can also plot the best model instead:
plot(result$best_model)Finally, you can access the conditional independencies and their associated p-values as well. This can be useful if you want to know why a certain model was rejected.
result$p_vals$one## d_sep p
## 1 BM ~ DD + NL 3.080168e-01
## 2 BM ~ RS + DD 1.502316e-01
## 3 NL ~ RS + BM + DD 4.456180e-13
## 4 NL ~ LS + BM 9.914635e-01
## 5 DD ~ LS + NL + BM 5.124034e-01
## 6 RS ~ LS + DD + BM 9.956174e-01
For model 1 it seems that the third conditional independence statement was violated.